Variant Discovery    ◾    137

-Xmx4g ApplyBQSR \
-I ${i}.bam \
-R ${ref} \
--bqsr-recal-file ../BQSR/${i}.table \
-O ../applyBQSR/${i}.bqsr.bam
done
cd ..

4.2.2.2.10  Variant calling

After BQSR of the BAM files, we can perform variant calling on each sample using the GATK4 “HaplotypeCaller” tool. HaplotypeCaller identifies active regions that show evidence of variation, reassembles the reads in each region into candidate haplotypes using a de Bruijn-like graph, and then applies a Bayesian model to call variants, as described above. When run with the “-ERC GVCF” option, HaplotypeCaller generates a Genomic Variant Call Format (gVCF) file for each sample. A gVCF stores sequencing information for both variant and non-variant sites in the genome: it records genotypes, annotations, and other information across all sites in a compact format by grouping runs of non-variant sites into reference blocks. Storing each sample's variants in gVCF format makes it easy to consolidate variants across samples later.
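To illustrate what this compact representation looks like, the sketch below writes a small, hand-made gVCF body (hypothetical coordinates, not real data; header lines and the FORMAT/sample columns are omitted) in which non-variant stretches are reference blocks carrying the <NON_REF> allele and an END= tag, and then counts how many positions its records cover. The helper count_covered_sites is an illustrative assumption, not a GATK tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how many records a gVCF body contains and how many
# genomic positions they cover. A reference block spans POS..END (END= in INFO);
# any other record is counted as one position (its POS). Header lines (#...) are skipped.
count_covered_sites() {
    awk '!/^#/ {
        last = $2
        if (match($8, /END=[0-9]+/)) last = substr($8, RSTART + 4, RLENGTH - 4)
        sites += last - $2 + 1
        records++
    }
    END { print records " records cover " sites " sites" }' "$1"
}

# Toy gVCF body (hand-written): CHROM POS ID REF ALT QUAL FILTER INFO.
# Real gVCFs are tab-delimited; awk's default field splitting accepts spaces or tabs.
toy=$(mktemp)
cat > "$toy" <<'EOF'
chr21 5010000 . A <NON_REF> . . END=5010999
chr21 5011000 . G T,<NON_REF> 50.6 . DP=12
chr21 5011001 . C <NON_REF> . . END=5015000
EOF

count_covered_sites "$toy"   # prints: 3 records cover 5001 sites
```

On a real single-sample gVCF, the same function could be applied to the decompressed file, for example count_covered_sites <(zcat ../gvcf/sample.g.vcf.gz), where sample.g.vcf.gz is a placeholder name; the number of records is expected to be much smaller than the number of sites covered.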

The following script uses HaplotypeCaller to generate a gVCF file for each sample. Since we are targeting only chromosome 21, we use the “-L chr21” option to restrict variant calling to that chromosome. Note that the chromosome label depends on the reference genome (e.g., chr21, 21, or chromosome21); therefore, check the chromosome names in the BAM header and adjust the “-L” value accordingly. The gVCFs themselves are produced by HaplotypeCaller through the “-ERC GVCF” option; the GATK4 GenotypeGVCFs tool is used afterward to genotype them jointly.
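Because the “-L” value must match a contig name exactly, one way to check it is to read the @SQ lines of the BAM header. The sketch below assumes bash and awk; list_contigs and find_chr21_label are hypothetical helper names, and the three spellings it tries (chr21, 21, chromosome21) are only common examples, not an exhaustive list.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of GATK or samtools).

# Print the contig names (the SN: field of each @SQ line) from a SAM header on stdin.
list_contigs() {
    awk -F'\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SN:/) print substr($i, 4)
    }'
}

# Print whichever label for chromosome 21 appears in the header on stdin;
# return non-zero if none of the tried spellings is found.
find_chr21_label() {
    local contigs label
    contigs=$(list_contigs)
    for label in chr21 21 chromosome21; do
        if printf '%s\n' "$contigs" | grep -qx -- "$label"; then
            echo "$label"
            return 0
        fi
    done
    return 1
}
```

With samtools installed, it could be used as chrom=$(samtools view -H sample.bqsr.bam | find_chr21_label), after which “-L ${chrom}” would replace “-L chr21” in the loop; sample.bqsr.bam is a placeholder file name.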

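mkdir -p gvcf
cd applyBQSR
# Reference FASTA (assumes exactly one .fasta file in refgenome/)
ref=$(ls ../refgenome/*.fasta)
# Sample prefixes: each BAM file name with its 4-character ".bam" suffix removed
for i in $(ls *.bam|rev|cut -c 5-|rev);
do
    # -L chr21 limits calling to chromosome 21; -ERC GVCF writes a per-sample gVCF
    ~/software/gatk-4.2.3.0/gatk \
        --java-options "-Xmx10g" HaplotypeCaller \
        -I ${i}.bam \
        -R ${ref} \
        -L chr21 \
        -ERC GVCF \
        -O ../gvcf/${i}.g.vcf.gz
done
cd ..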
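Since a failed HaplotypeCaller run for one sample is easy to miss in a long loop, a quick check that every BAM produced a non-empty gVCF can help before moving on to consolidation. The sketch below is a hypothetical helper (missing_gvcfs is not a GATK command); it derives sample names the same way as the loop, by removing the ".bam" suffix, and prints any sample without a gVCF.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list samples whose gVCF is missing or empty.
# Arguments: directory of BAMs (e.g. applyBQSR) and directory of gVCFs (e.g. gvcf).
# Returns 0 if every BAM has a non-empty <sample>.g.vcf.gz, 1 otherwise.
missing_gvcfs() {
    local bam_dir=$1 gvcf_dir=$2 bam sample status=0
    for bam in "$bam_dir"/*.bam; do
        [ -e "$bam" ] || continue              # no BAMs: the glob stays unexpanded
        sample=$(basename "$bam" .bam)          # same prefix as the loop's ${i}
        if [ ! -s "$gvcf_dir/$sample.g.vcf.gz" ]; then
            echo "$sample"
            status=1
        fi
    done
    return "$status"
}
```

Run from the project directory, missing_gvcfs applyBQSR gvcf prints nothing and returns 0 when every sample has a gVCF; any names it prints are samples whose HaplotypeCaller run should be repeated.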

4.2.2.2.11  Consolidating variants across samples

The above script used HaplotypeCaller to generate a gVCF file for each sample. The next step is to use the GenomicsDBImport tool to import the single-sample gVCFs into a GenomicsDB workspace, and then the GenotypeGVCFs tool to jointly genotype all samples and consolidate their variants into a single VCF file. For GenomicsDBImport, each input gVCF file is passed through the “-V” option. For multiple gVCF